I love Star Wars. I love the storytelling and fantasy, but I especially love the music. John Williams is amazing. There was a podcast called Star Wars Oxygen that covered the music of Star Wars, and it was one of my favorite podcasts of all time. Jimmy Mac hosted while voice actor, musician, and composer David W. Collins broke down the scores for the films we know and love in a way that gave me a new appreciation for them. I say there was a podcast because it went dark following the release of Rogue One. After 38 wonderful volumes the podcast just wasn’t updated anymore, and we the fans have not heard anything about why they stopped producing the show.
I also love statistics and ecology, which is the study of how organisms relate to each other and their environments. One exciting area of research deals with species diversity, which is how many species are found in and among sites. We can use statistics to figure out how many things live in a certain area and to compare how similar different habitats are to one another. In order to conduct an analysis like this you need a count matrix, which has habitats in the rows and species in the columns. The cells are filled in with counts of how many individuals of each species were found in each habitat. An example of a count matrix could look like this, where butterfly species are in columns and different habitats are in rows:
Table 1: Example of a count matrix where each row represents a habitat and each column represents a species. The cells are filled in with counts of the number of each species observed at each habitat.
| Habitat | Danaus plexippus | Vanessa cardui | Adelpha bredowii |
|---|---|---|---|
| Donner Pass | 5 | 6 | 0 |
| Sierraville | 4 | 2 | 2 |
| Davis | 0 | 0 | 3 |
In this example, we can see that Donner Pass and Sierraville share two species, and in similar numbers. Davis and Sierraville are also somewhat similar to each other because they have one species in common, while Davis and Donner Pass share none. If we were going to group these sites based on similarity, Donner Pass and Sierraville would be more similar to each other than either is to Davis.
If we plot these relationships (after applying some statistics) in the form of a “tree”, where similar habitats are connected by a “branch,” we see that Donner Pass and Sierraville are most similar to each other (they are connected by a branch). This makes sense, because Donner Pass and Sierraville are about 50 km apart, while both sites are about 160 km from Davis.
Figure 1: Cluster plot of the toy example referred to above.
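To make "after applying some statistics" a little more concrete, here is a minimal sketch of one way to score how different two habitats are. It assumes the Bray-Curtis dissimilarity, a common choice in ecology. The post does not say which measure was actually used, and the original analysis was done in R, so treat this Python version and its function name as illustrative only.

```python
# Toy count matrix from Table 1: rows are habitats, columns are butterfly species.
counts = {
    "Donner Pass": [5, 6, 0],
    "Sierraville": [4, 2, 2],
    "Davis": [0, 0, 3],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (assumed measure): 0 = identical counts, 1 = no shared species."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return diff / total


# Compute the dissimilarity for every pair of habitats.
sites = list(counts)
pairs = {}
for i, s in enumerate(sites):
    for t in sites[i + 1:]:
        pairs[(s, t)] = bray_curtis(counts[s], counts[t])

# Print the pairs from most similar to least similar.
for (s, t), d in sorted(pairs.items(), key=lambda kv: kv[1]):
    print(f"{s} vs {t}: {d:.2f}")

# The pair with the smallest dissimilarity would be joined first in a cluster tree.
closest = min(pairs, key=pairs.get)
```

Under this assumed measure, Donner Pass and Sierraville come out closest (about 0.37), Sierraville and Davis are further apart (about 0.64), and Donner Pass and Davis share nothing (1.00), which matches the grouping described above.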
A short aside: I created this page using R and the R Markdown package in RStudio. For those curious, all of the code and data used to create this exact post are freely available through this project’s GitHub repository.
During the Star Wars Oxygen podcast, David W. Collins began what he called his “theme tracker,” which was essentially a spreadsheet of the number of times a theme played per film.
David W. Collins made a count matrix.
We can use statistics on a count matrix.
We can apply statistics to Star Wars!!!! Oh happy day!!!
To reverse engineer the theme tracker I listened back through all of the Star Wars Oxygen episodes with pencil and paper ready. I made note of how often a theme was played during a particular film every time Mr. Collins mentioned it. In some instances, I had to get a bit of help so I watched the films and made notes of all the times I heard a theme. I also read the breakdowns and threads from these sites:
This was especially helpful when going through Attack of the Clones, which had a lot of music edits.
I then attempted my own impression of David W. Collins and Star Wars Oxygen and went through Rogue One three times, counting each instance of what I thought was a “theme.” At the time of writing this (2017-12-22) I have seen The Last Jedi four times, and in the last two viewings I took a piece of paper and a pencil with me to note every time I heard a theme. I am almost certainly wrong in places because I am not a trained musician, and I might have considered themes to be separate entities when they were actually part of the same leitmotif. Regardless, the numbers I present here are based on the work of others and myself, and I stand by them.
The data I ended up with, and which are used here, had:
These data could be incomplete and are in need of improvement. I am particularly concerned by the lack of “rare” themes in the data set. Rare things can be important in ecology but won’t have a big impact on the similarity of music used in the movies. I could still use some help! Please contribute to the theme tracker. There are a few ways you could contribute:
GitHub (for those with technical skills).

Let’s make a histogram where the total number of appearances each theme makes in the saga is plotted.
Why did I make this plot?
Well, seeing how many times each theme appears across all of the Star Wars films can give us an idea of what the major themes are in the series. For example, The Force / Obi Wan Kenobi’s theme is used 135 times in the films, which is 42 more appearances than the Main title / Luke’s theme gets. To really investigate the plot I made, hover your cursor over each bar to see what it represents.
Figure 2: Plot of all theme appearances
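The totals behind a plot like this are just column sums of the count matrix. As a small sketch (in Python rather than the R used for this page, and with variable names of my own choosing), here are the two themes mentioned above, with per-film counts copied from the data table at the end of this post:

```python
# Film order matches the rows of the data table at the end of the post.
films = ["EI", "EII", "EIII", "EIV", "EV", "EVI", "EVII", "EVIII", "ROne"]

# Per-film counts for two themes, copied from the Force_theme and Main_title columns.
theme_counts = {
    "Force_theme": [11, 8, 20, 18, 14, 19, 13, 25, 7],
    "Main_title": [6, 3, 4, 17, 28, 18, 13, 2, 2],
}

# Total appearances of each theme across all films (a column sum).
totals = {theme: sum(per_film) for theme, per_film in theme_counts.items()}

# How many more times the Force theme is heard than the Main title.
difference = totals["Force_theme"] - totals["Main_title"]

print(totals)      # {'Force_theme': 135, 'Main_title': 93}
print(difference)  # 42
```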
It may also be informative to look at the distribution of themes within each film. I’ve made a plot where each film is represented by a bar and that bar is filled according to the frequency of the themes in that movie. To explore this figure, hover your cursor over a bar to see the theme and number of times it appeared in that film. Try clicking on compare data on hover to see all the themes at once. The color for each theme is consistent across all the films.
Figure 3: Themes by film
As I suspected, The Last Jedi has the greatest number of thematic appearances at over 150! I bet that when we look at the diversity metrics below, the same will be true for the total number of distinct themes present in the film.
Now we’ll make a tree depicting the relationships between the seven films of the Star Wars saga just as we did in the toy example above.
First, a prediction for the clustering analysis. I postulate that the three original trilogy films will cluster together, separate from the prequel trilogy films (which will also cluster together). I also predict that The Force Awakens and Rogue One will be more similar to the original trilogy than to the prequels.
Adding the data from Rogue One allows us to see where that film lies in relation to the others. Michael Giacchino rooted the music for Rogue One firmly within Star Wars. He used parts from A New Hope to form the themes used in Rogue One, for example Jyn Erso’s Suite was based on “the Message,” which plays in the background when Obi-Wan says “You must learn the ways of the Force….” It is also the only Star Wars film to share “Darth Vader’s” theme with A New Hope.
I have also added data from two viewings of The Last Jedi, which I think is a masterful score. I picked up on themes for Finn and Rose, as well as the return of both of Kylo Ren’s themes, Rey’s theme, Poe’s theme, and the March of the Resistance. Not going to lie, I choked up when I heard Luke and Leia in the final scene.
Figure 4: Clustering of the Star Wars films based on their musical theme counts.
This plot shows that the prequel trilogy films do indeed cluster together, and that the original trilogy films cluster together. When I include the data from The Last Jedi we see it come out right next to The Force Awakens, suggesting that the new trilogy films have a lot in common musically. Notice that Rogue One is right there in between the prequel trilogy and the original trilogy? It’s almost like that film is its own thing, which makes a lot of sense to me because it shares some themes with the original trilogy but is really a uniquely scored film.
Jost’s D is a way of counting how many things there are in a certain habitat. The cool thing about Jost’s D is that you can consider how many things there are while accounting for how rare they are (that is the q on the bottom of the plot). Here we count the number of different themes by film and consider how many different themes there are if we weight “rarity.”
Figure 5: Plot of the effective number of themes by Star Wars film
To read this plot we look at the y (vertical) axis to see the number of themes. The Greek letter alpha (\(\alpha\)) is the ecological shorthand for diversity within a single habitat, which here means within a single film. Along the x (horizontal) axis we have the different weights we place on “rarity,” the q that I mentioned above. A weight of 0 means that all themes count equally, so it gives the total number of themes present in each film. As we move right along the x-axis the effective number of themes decreases because rare themes are given less weight. All the way to the right (q = 5) we hardly consider the effect that rare themes have on the number of themes.
Note that The Empire Strikes Back actually has the fewest total number of themes (when q = 0) at 8, followed by A New Hope with 9. When we get to The Last Jedi there are 18 different themes that appear in the film! Rogue One actually has the highest total number of themes, but when we care less about rare themes (q = 5) The Last Jedi is a bit higher. One thing that appears evident from this analysis is that every Star Wars film has ~5 themes that we hear frequently.
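For readers who want to see how an "effective number of themes" can be computed, here is a minimal sketch. It assumes the standard Hill number formula that Jost's effective numbers are based on, D = (sum of p_i^q)^(1/(1-q)), with the exponential of Shannon entropy as the q = 1 case. The plot above was made in R and I am not claiming this is the exact routine behind it; the function name is hypothetical, and the counts are the nonzero entries for three films copied from the data table at the end of this post.

```python
import math


def effective_number(counts, q):
    """Hill number of order q (assumed formula): the effective number of themes."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


# Nonzero theme counts per film, copied from the data table (rows EV, EVIII, ROne).
empire = [28, 14, 3, 6, 19, 37, 14, 11]
last_jedi = [2, 25, 10, 9, 10, 2, 23, 1, 12, 13, 4, 3, 3, 1, 17, 3, 7, 9]
rogue_one = [2, 7, 2, 3, 6, 3, 2, 2, 23, 20, 7, 9, 4, 13, 3, 1, 4, 10, 3]

for name, film in [("Empire", empire), ("Last Jedi", last_jedi), ("Rogue One", rogue_one)]:
    # q = 0 counts every theme equally; q = 5 mostly reflects the frequent themes.
    print(name, effective_number(film, 0), round(effective_number(film, 5), 2))
```

At q = 0 this simply counts the themes present (8, 18, and 19 for these three films), and at q = 5 The Last Jedi comes out ahead of Rogue One, in line with the ordering described above. The exact values at higher q depend on the formula and data version used, so they may not match the plot.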
One last note of geekery. The colors from that plot were made with an R package called spaceMovie, which uses colors from the Star Wars franchise.
Lastly, I employ another method of visualization called NMDS (Non-Metric MultiDimensional Scaling) which plots the locations of each “habitat” in ordination space. In this case, each film appears on the plot in a place relative to the other films. That is to say, similar things should be closer together than dissimilar things.
Figure 6: NMDS Ordination plot of the Star Wars films.
## Error in Fortran routine computing the spanning ellipsoid,
## probably collinear data
Think about which films you could draw an ellipse around without including any other films. We could have the computer draw an ellipse around the prequel trilogy so that it only contains the prequels. This suggests that the prequel films are more similar to each other than they are to other films. It also appears that we can have the computer draw an ellipse around the original trilogy. Lastly, The Force Awakens and The Last Jedi are off by themselves, and I predict that once Episode IX comes out it will be over there with them (the computer can’t draw an ellipse around two points). As in the clustering analysis, Rogue One is off doing its own thing. Altogether, these findings are consistent with the clustering plot we saw earlier.
I have four big takeaways about the music of the Star Wars films based on this exercise:
These results make a lot of sense to me. I interpret these results to mean that John Williams kept similar themes throughout each of the two trilogies, and that the new trilogy is building off of the original trilogy.
Prior to The Last Jedi, I predicted that Episode VIII would appear closely related to The Force Awakens. I’m glad to see that I was right. If you don’t believe that I predicted this, go through the “history” in the code repository that houses this page and see for yourself. Lastly, Michael Giacchino used themes found in A New Hope to ground Rogue One in the Star Wars musical universe, but made it his own.
In case you didn’t want to follow the links to find the data I used for this post, below is a copy:
| Film | Main_title | Force_theme | Vaders_theme | Leia_theme | Death_star | Rebel_fanfare | March_resistance. | Han_Leia | Reys_theme | Imperial_march | Kylo_1 | Kylo_2 | Poes_theme | Falcon_theme | Scherzo | Jedi_steps | Battle_heroes | Emperor_theme | Across_stars | Greivous_theme | Arena_monsters | Trade_federation | Anakin_theme | Yoda_theme | Duel_fates | Droid_Empire | Jaba_theme | Luke_Leia | Droid_Jedi | Jar_Jar | Qui_Gon | Jangos_escape | Separatist_conspiracy | Camino | Tusken_slaughter | Rogue_theme | The_Message | Jyns_theme | Krennics_theme | Guardian_Whills | Jedha_Saw | Battle_preparations | The_rebels | Rebel_action | Troopers_moving | Master_switch | Scarif_battle | Hope | Rose_theme | TIE_fighter_attack | Snoke | Finn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EI | 6 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EII | 3 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 25 | 0 | 6 | 2 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 11 | 6 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EIII | 4 | 20 | 0 | 2 | 1 | 3 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 6 | 6 | 3 | 2 | 1 | 1 | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EIV | 17 | 18 | 17 | 8 | 7 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| EV | 28 | 14 | 0 | 3 | 0 | 6 | 0 | 19 | 0 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EVI | 18 | 19 | 0 | 4 | 0 | 9 | 0 | 9 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 5 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EVII | 13 | 13 | 0 | 4 | 0 | 8 | 5 | 3 | 25 | 2 | 10 | 2 | 4 | 8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| EVIII | 2 | 25 | 0 | 10 | 0 | 9 | 10 | 2 | 23 | 1 | 12 | 13 | 4 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 3 | 7 | 9 |
| ROne | 2 | 7 | 2 | 0 | 3 | 6 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 23 | 20 | 7 | 9 | 4 | 13 | 3 | 1 | 4 | 10 | 3 | 0 | 0 | 0 | 0 |